Introduction and Data








Background

For the past year and a half, COVID-19 has completely disrupted the world’s way of living. Miraculously, scientists developed vaccines in record time. The aim of this report is to explore the current state of the vaccine rollout worldwide and its relationship, if any, to factors such as number of cases, GDP per capita, and corruption, among others.

Research questions

The questions we were interested in are:

  1. How did the ratio of a country’s cases align with its ratio of vaccines administered? And what were the most vaccinated countries?
  2. What were the most and least common vaccines across the countries?
  3. How does GDP per capita influence the level of cases and vaccinations for a country?
  4. What is the relationship, if any, between vaccination numbers and corruption, and between vaccination numbers and the human development index?


Data

  1. COVID data

This is a large dataset containing many types of COVID information for each country and each day since the pandemic began. The dataset includes case numbers and vaccination numbers, as well as additional measures such as population, GDP, and development index. The dataset is accompanied by a codebook that explains each variable.

This data was retrieved from the Our World in Data website.

  2. CPI - Corruption Perceptions Index

The CPI scores and ranks countries based on how corrupt a country’s public sector is perceived to be by experts and business executives. The CPI is the most widely used indicator of corruption worldwide. It uses a scale of zero to 100, where zero is highly corrupt and 100 is very clean. With the help of the CPI, we can see how corruption undermines states’ capacity to respond to emergencies such as the dual health and economic crisis brought on by COVID-19.

This data was retrieved from transparency.org.

  3. Vaccination data

This dataset consists of data on the various brands of vaccines available and in use in each country, and the number of vaccinations that have taken place by a specific date in a particular country.

This data was retrieved from the World Health Organisation.

Limitations:

  • We can only report which countries possess each brand; the number of vaccinations administered per brand was not explored because no such dataset was located

  • The data was restricted to a single date, April 1, 2021, so no time series analysis was performed

Data Cleaning








COVID main dataset

The COVID data covers a wide range of COVID-19 measures. We removed the variables that were not needed and kept only those required for the analysis, such as total cases, total deaths, GDP per capita and several other metrics. We also filtered the data to April 1, 2021.

Dealing with missing values

The COVID data came with many missing values due to unavailability. To deal with this, we removed the continent aggregate rows and the locations with no cases, because the latter are included in other countries (for example, Macao is reported under China). Some missing values were assumed to be zero and were imputed as such. Furthermore, for some countries the missing human development index and median age values were filled in with figures found in a different data source.
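
The two kinds of imputation described above can be sketched on a toy table. This is a minimal illustration, not the report's full cleaning step: the `toy` data frame and the location `Exampleland` are made up for the example, while the two human development index values are the ones used for Kosovo and Monaco in the report's source code.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the filtered COVID data (hypothetical rows and counts)
toy <- tibble(
  location = c("Kosovo", "Monaco", "Exampleland"),
  total_vaccinations = c(NA, 20000, 500),
  total_deaths = c(10, NA, 3),
  human_development_index = c(NA, NA, 0.7)
)

toy_filled <- toy %>%
  mutate(
    # missing counts are assumed to mean that nothing was reported, so use 0
    across(c(total_vaccinations, total_deaths), ~ replace_na(.x, 0)),
    # missing index values are filled from an external look-up, per location
    human_development_index = case_when(
      location == "Kosovo" ~ 0.787,
      location == "Monaco" ~ 0.956,
      TRUE ~ human_development_index
    )
  )

toy_filled
```

Treating a missing count as zero is an assumption rather than a fact about the country, so it is worth keeping the list of zero-imputed variables short and explicit, as done here.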


Joining the main dataset with CPI dataset

We joined our main COVID data file with our Corruption Perceptions Index (CPI) data file. Next, we discarded the variables from the CPI dataset that would not contribute to our questions. Finally, we renamed the CPI variable to keep the variable names consistent with good naming conventions.

Joining the main dataset with Vaccination dataset

We then joined the resulting data file with our vaccination data file and discarded the variables from the vaccination dataset that would not contribute to our questions. Next, we split the vaccine brands into separate rows, one row per country and brand, so we could better tackle our research questions. Finally, we cleaned the vaccine brands by removing the leading space from some brand names and renamed the variable to follow a good naming convention.
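
The brand-splitting step can be illustrated with a small sketch. The `toy_vax` table, its ISO codes and the brand names `Brand A` and `Brand B` are hypothetical; only the column name `VACCINES_USED` and the functions used follow the report's source code.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy vaccination table: one row per country, brands in one comma-separated cell
toy_vax <- tibble(
  iso_code = c("AAA", "BBB"),
  VACCINES_USED = c("Brand A, Brand B", "Brand B")
)

toy_long <- toy_vax %>%
  # one row per country and brand
  separate_rows(VACCINES_USED, sep = ",") %>%
  # splitting on "," leaves a leading space on every brand after the first
  mutate(VACCINES_USED = str_trim(VACCINES_USED)) %>%
  rename(vaccines_used = VACCINES_USED)

# number of countries using each brand, most common first
brand_counts <- toy_long %>%
  count(vaccines_used, sort = TRUE)

brand_counts
```

Without the trimming step, `"Brand B"` and `" Brand B"` would be counted as two different brands, which is why the cleaning comes before any counting.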

Most Vaccinated Countries








Top 5 Vaccinated Countries per Continent


Top Vaccinated Countries

location               continent       total_vaccinations_per_hundred
Israel                 Asia            116.31
Seychelles             Africa          103.80
United Arab Emirates   Asia             84.84
Chile                  South America    56.46
Bhutan                 Asia             56.35
United Kingdom         Europe           53.44
Malta                  Europe           46.10
United States          North America    45.94
Bahrain                Asia             45.56
Maldives               Asia             44.54
Serbia                 Europe           36.57
Hungary                Europe           31.26

Most Used Vaccine in the World











What are the numbers like?

Per Continent








Countries with the Most Number of Brands









GDP vs Vaccinations and Cases








VACC vs GDP


Cases vs GDP

CPI score








CPI vs Vaccination

CPI vs Covid-19 test

HDI








HDI vs Vaccination

HDI vs Covid-19 test

CPI vs Vaccinations








Regression (R² = 17.7%)

# A tibble: 2 x 5
  term           estimate std.error statistic       p.value
  <chr>             <dbl>     <dbl>     <dbl>         <dbl>
1 (Intercept)      -8.68     2.97       -2.93 0.00387      
2 CPI_score_2020    0.385    0.0628      6.13 0.00000000565
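
The coefficient table does not show the R² quoted in the heading, so the sketch below shows one way such a figure can be obtained from a fitted model. It uses simulated data, not the report's dataset: the `toy` data frame, its column names `cpi` and `vax`, and the noise level are assumptions made for illustration.

```r
# Simulated data: a noisy linear relationship (not the report's data)
set.seed(1)
toy <- data.frame(cpi = seq(10, 90, by = 5))
toy$vax <- -8 + 0.4 * toy$cpi + rnorm(nrow(toy), sd = 10)

fit <- lm(vax ~ cpi, data = toy)

# R^2: the share of the variance in vax explained by the fitted line
r2 <- summary(fit)$r.squared

# with a single predictor, R^2 equals the squared Pearson correlation
r2_from_cor <- cor(toy$cpi, toy$vax)^2

# expressed as a percentage, the form used in the heading
r2_pct <- round(100 * r2, 1)
```

An R² below 20% means that most of the variation between countries is left unexplained by the single predictor, even when the slope itself is clearly different from zero.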

Residual visualisation




Residuals plots

CPI vs Tests








Regression (R² = 18%)

# A tibble: 2 x 5
  term           estimate std.error statistic       p.value
  <chr>             <dbl>     <dbl>     <dbl>         <dbl>
1 (Intercept)      -424.     133.       -3.17 0.00177      
2 CPI_score_2020     17.5      2.82      6.21 0.00000000382
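
To read the fitted line, the printed coefficients can be plugged in by hand. The sketch below uses the rounded estimates shown in the table (so results are approximate), and the helper name `predict_tests` is hypothetical.

```r
# Rounded estimates copied from the coefficient table above
intercept <- -424
slope <- 17.5

# Fitted tests per thousand for a given CPI score (hypothetical helper)
predict_tests <- function(cpi) intercept + slope * cpi

# A country with a CPI score of 50
predict_tests(50)

# CPI score at which the fitted line crosses zero
cpi_zero_crossing <- -intercept / slope
cpi_zero_crossing
```

Below a CPI score of roughly 24 the line predicts a negative number of tests, which is impossible. This is one concrete sign that a straight line is only a rough description of the relationship, in keeping with the low R².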

Residual visualisation


Residual plots

Conclusions







As a result of the analysis performed, the following conclusions were reached:

  1. The case rates for a majority of the countries do not match their vaccination rates: high case rates do not result in high vaccination rates and vice versa. This is also reflected, on average, on a per continent basis. The top five vaccinated countries per continent were also highlighted; of note is the very low vaccination rate in Africa excluding Seychelles, as well as the very low rate in Oceania. Israel, Seychelles, UAE, Chile and Bhutan occupy the top five spots worldwide in vaccination rates.

  2. Exploring the different vaccines used worldwide shows that AstraZeneca is the most widely used vaccine, being common throughout the world and within most individual continents. On the other hand, EpiVacCorona and Anhui ZL are each used only in their country of origin, as they are still relatively new compared to the other available vaccines, making them the least used vaccines worldwide.

  3. Since there is little to no relationship between GDP per capita and the level of vaccination, GDP per capita is not a strong explanatory variable for the level of vaccination. Furthermore, in general, wealthier countries tend to have higher total COVID cases per million.

  4. Countries with fewer than 40 vaccinations per 100 people tend to be perceived as highly corrupt, while the top three countries by vaccination level belong to the less corrupt cluster. The number of tests is higher in less corrupt countries, while fewer than 20 people per thousand are tested in highly corrupt countries. We fitted simple linear regressions of the vaccination level and of the number of tests on the CPI score. In each case the CPI score explains just under 20% of the variation (R² of 17.7% and 18%, respectively).

References







Software

---
title: "COVID-19 Vaccines"
author: "T3_Wed_suggrants"
output:
  flexdashboard::flex_dashboard:
    vertical_layout: fill
    source_code: embed
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
```

```{r libraries}
library(tidyverse)
library(naniar)
library(lubridate)
library(plotly)
library(viridis)
library(maps)
library(ggthemes)
library(tidytext)
library(kableExtra)
library(here)
library(ggmap)
library(modelr)
library(broom)
library(janitor)
library(ggResidpanel)
library(DT)
library(flexdashboard)
```

Introduction and Data {data-orientation=rows}
=====================================






Row
-------------------------------------

### Background

For the past year and a half, COVID-19 has completely disrupted the world's way of living. Miraculously, scientists developed vaccines in record time. The aim of this report is to explore the current state of the vaccine rollout worldwide and its relationship, if any, to factors such as number of cases, GDP per capita, and corruption, among others.

### Research questions

The questions we were interested in are:

1. How did the ratio of a country's cases align with its ratio of vaccines administered? And what were the most vaccinated countries?
2. What were the most and least common vaccines across the countries?
3. How does GDP per capita influence the level of cases and vaccinations for a country?
4. What is the relationship, if any, between vaccination numbers and corruption, and between vaccination numbers and the human development index?

Row
-------------------------------------

### Data

1. COVID data

    This is a large dataset containing many types of COVID information for each country and each day since the pandemic began. The dataset includes case numbers and vaccination numbers, as well as additional measures such as population, GDP, and development index. The dataset is accompanied by a codebook that explains each variable.

    This data was retrieved from the Our World in Data website.

2. CPI - Corruption Perceptions Index

    The CPI scores and ranks countries based on how corrupt a country's public sector is perceived to be by experts and business executives. The CPI is the most widely used indicator of corruption worldwide. It uses a scale of zero to 100, where zero is highly corrupt and 100 is very clean. With the help of the CPI, we can see how corruption undermines states' capacity to respond to emergencies such as the dual health and economic crisis brought on by COVID-19.

    This data was retrieved from transparency.org.

3. Vaccination data

    This dataset consists of data on the various brands of vaccines available and in use in each country, and the number of vaccinations that have taken place by a specific date in a particular country.

    This data was retrieved from the World Health Organisation.

### Limitations:

* We can only report which countries possess each brand; the number of vaccinations administered per brand was not explored because no such dataset was located
* The data was restricted to a single date, April 1, 2021, so no time series analysis was performed

Data Cleaning {data-icon="fa-table" data-orientation=rows}
=====================================





```{r load-data}
covid <- read_csv(here::here('data/owid-covid-data.csv'))
vax <- read_csv(here::here('data/vaccination-data.csv'))
cpi <- read_csv(here::here('data/CPI2020.csv'), skip = 2)
```

```{r clean-covid}
# this dataset comes with a codebook found at: https://github.com/owid/covid-19-data/tree/master/public/data
covid_tidy <- covid %>%
  #deselect unneeded columns
  select(-starts_with(c('aged', 'hosp', 'icu', 'new', 'tests', 'weekly')),
         -ends_with(c('rate', 'smokers')),
         -diabetes_prevalence, -handwashing_facilities, -extreme_poverty,
         -life_expectancy, -population_density, -stringency_index) %>%
  #filter date to 01/04/2021
  mutate(date = dmy(date)) %>%
  filter(date == '2021-04-01')
```

```{r covid-missing-values}
#Explore missing values
#gg_miss_var(covid_tidy)
covid_filled <- covid_tidy %>%
  #remove continent rows and locations with no cases because they are included in other countries, for example Anguilla is part of UK
  filter(!location %in% c('International', 'Africa', 'Asia', 'Europe', 'European Union',
                          'North America', 'Oceania', 'South America', 'World',
                          'Anguilla', 'Bermuda', 'Cayman Islands', 'Curacao', 'Faeroe Islands',
                          'Falkland Islands', 'Gibraltar', 'Greenland', 'Guernsey', 'Hong Kong',
                          'Isle of Man', 'Jersey', 'Macao', 'Montserrat', 'Northern Cyprus', 'Vatican')) %>%
  #variables with missing values that will be replaced with 0
  mutate(across(c(starts_with('people'), total_tests_per_thousand, total_tests,
                  total_vaccinations_per_hundred, total_vaccinations,
                  total_deaths, total_deaths_per_million),
                .fns = ~replace_na(., 0)),
         #replace missing human development index with numbers obtained from https://en.populationdata.net/rankings/hdi/
         human_development_index = case_when(location == 'Kosovo' ~ 0.787,
                                             location == 'Monaco' ~ 0.956,
                                             location == 'San Marino' ~ 0.961,
                                             location == 'Somalia' ~ 0.364,
                                             location == 'Taiwan' ~ 0.907,
                                             TRUE ~ human_development_index),
         #replace missing median age with numbers obtained from https://www.cia.gov/the-world-factbook/field/median-age/country-comparison
         median_age = case_when(location == 'Andorra' ~ 46.2,
                                location == 'Dominica' ~ 34.9,
                                location == 'Kosovo' ~ 30.5,
                                location == 'Liechtenstein' ~ 43.7,
                                location == 'Marshall Islands' ~ 23.8,
                                location == 'Monaco' ~ 55.4,
                                location == 'Saint Kitts and Nevis' ~ 36.5,
                                location == 'San Marino' ~ 45.2,
                                TRUE ~ median_age),
         #recode Kosovo ISO to make it consistent across all data files
         iso_code = recode(iso_code, OWID_KOS = 'KOS'))
#gg_miss_var(covid_filled)
```

```{r join-with-CPI}
#left_join with CPI
covid_cpi <- covid_filled %>%
  left_join(cpi %>%
              #recode Kosovo ISO to make it consistent across all data files
              mutate(ISO3 = recode(ISO3, KSV = 'KOS')),
            by = c('iso_code' = 'ISO3')) %>%
  #remove unneeded variables
  select(-Country, -Region, -c(Rank:'World Justice Project Rule of Law Index')) %>%
  #rename CPI variable
  rename(CPI_score_2020 = 'CPI score 2020')
```

```{r join-with-vax}
#left_join with vax
covid_clean <- covid_cpi %>%
  left_join(vax %>%
              #recode Kosovo ISO to make it consistent across all data files
              mutate(ISO3 = recode(ISO3, XKX = 'KOS')),
            by = c('iso_code' = 'ISO3')) %>%
  #remove unneeded variables
  select(-c(COUNTRY:PERSONS_VACCINATED_1PLUS_DOSE_PER100), -FIRST_VACCINE_DATE, -NUMBER_VACCINES_TYPES_USED) %>%
  #separate vaccines_used column into rows
  separate_rows(VACCINES_USED, sep = ',') %>%
  #remove space from beginning of some vaccine names
  mutate(VACCINES_USED = str_trim(VACCINES_USED)) %>%
  #rename VACCINES_USED variable to be lowercase
  rename(vaccines_used = VACCINES_USED)
```

Row
-------------------------------------

### COVID main dataset

The COVID data covers a wide range of COVID-19 measures. We removed the variables that were not needed and kept only those required for the analysis, such as total cases, total deaths, GDP per capita and several other metrics. We also filtered the data to April 1, 2021.

### Dealing with missing values

The COVID data came with many missing values due to unavailability. To deal with this, we removed the continent aggregate rows and the locations with no cases, because the latter are included in other countries (for example, Macao is reported under China). Some missing values were assumed to be zero and were imputed as such. Furthermore, for some countries the missing human development index and median age values were filled in with figures found in a different data source.

Row
-------------------------------------

### Joining the main dataset with CPI dataset

We joined our main COVID data file with our Corruption Perceptions Index (CPI) data file. Next, we discarded the variables from the CPI dataset that would not contribute to our questions. Finally, we renamed the CPI variable to keep the variable names consistent with good naming conventions.

### Joining the main dataset with Vaccination dataset

We then joined the resulting data file with our vaccination data file and discarded the variables from the vaccination dataset that would not contribute to our questions. Next, we split the vaccine brands into separate rows, one row per country and brand, so we could better tackle our research questions. Finally, we cleaned the vaccine brands by removing the leading space from some brand names and renamed the variable to follow a good naming convention.

Global Vaccine Trends {data-icon="fa-globe"}
=====================================





```{r distinct-locations}
# keep one row per location (the brand split created several rows per country)
covid_dist <- covid_clean %>%
  distinct(location, .keep_all = TRUE)
```

```{r load-map}
# align map region names with the location names used in the COVID data
world_map <- map_data("world") %>%
  mutate(region = recode(region,
                         "USA" = "United States",
                         "Republic of Congo" = "Congo",
                         "Ivory Coast" = "Cote d'Ivoire",
                         "Czech Republic" = "Czechia",
                         "Democratic Republic of the Congo" = "Democratic Republic of Congo",
                         "Swaziland" = "Eswatini",
                         "Micronesia" = "Micronesia (country)",
                         "Macedonia" = "North Macedonia",
                         "Timor-Leste" = "Timor",
                         "UK" = "United Kingdom"))
```

```{r covid-map}
covid_map <- covid_dist %>%
  left_join(world_map, by = c("location" = "region"))
```

Inputs {.sidebar}
-------------------------------------





### Maps

* Europe, the US and South America had the highest rates of cases
* While the US had many vaccinations in line with its case rate, the vaccination rate for the majority of the remaining countries is not in line with their case rate (South America, for example)

### Graphs

* North America, Europe and South America each had very high case rates, while Asia, Africa and Oceania's case rates were very low
* The vaccination rates do not match the case rates per continent: South America has much lower vaccination rates than Europe and North America, and is overtaken by Asia, whose case rates are much lower. Differing GDP and corruption levels between continents might be a factor

Column
-------------------------------------

### Cases per Million Global Map

```{r cases-map}
p1 <- ggplot(covid_map) +
  geom_polygon(aes(x = long, y = lat, group = group,
                   fill = total_cases_per_million, label = location)) +
  theme_map() +
  labs(fill = "Total cases per million") +
  scale_fill_viridis(na.value = "white")
ggplotly(p1)
```

### Vaccinations per 100 Global Map

```{r vax-map}
p2 <- ggplot(covid_map) +
  geom_polygon(aes(x = long, y = lat, group = group,
                   fill = total_vaccinations_per_hundred, label = location)) +
  theme_map() +
  labs(fill = "Total vaccinations per hundred") +
  scale_fill_viridis(na.value = "white")
ggplotly(p2)
```

Column
-------------------------------------

### Cases per Continent

```{r cases-graph}
# continent rate = total cases over total population, not a mean of country rates
covid_dist %>%
  group_by(continent) %>%
  summarise(cases = sum(total_cases),
            population = sum(population),
            cases_per_million = cases / population * 1000000) %>%
  ggplot(aes(x = fct_reorder(continent, cases_per_million),
             y = cases_per_million,
             fill = continent)) +
  geom_col() +
  theme(axis.text.x = element_text(size = 9)) +
  labs(x = "Continent", y = "Cases per million")
```

### Vaccinations per Continent

```{r vax-graph}
covid_dist %>%
  group_by(continent) %>%
  summarise(vaccinations = sum(total_vaccinations),
            population = sum(population),
            vaccinations_per_hundred = vaccinations / population * 100) %>%
  ggplot(aes(x = fct_reorder(continent, vaccinations_per_hundred),
             y = vaccinations_per_hundred,
             fill = continent)) +
  geom_col() +
  theme(axis.text.x = element_text(size = 9)) +
  labs(x = "Continent", y = "Vaccinations per 100")
```

Most Vaccinated Countries {data-icon="fa-user-md"}
=====================================





Column {data-width=700}
-------------------------------------

### Top 5 Vaccinated Countries per Continent

```{r top-countries-by-cont, fig.height=6}
covid_dist %>%
  group_by(continent) %>%
  arrange(-total_vaccinations_per_hundred) %>%
  slice_head(n = 5) %>%
  ungroup() %>%
  # shorten long names so the axis labels fit
  mutate(location = recode(location,
                           "Equatorial Guinea" = "Eq. Guinea",
                           "United Arab Emirates" = "UAE",
                           "United Kingdom" = "UK",
                           "Micronesia (country)" = "Micronesia")) %>%
  ggplot(aes(x = reorder_within(location, total_vaccinations_per_hundred, continent),
             y = total_vaccinations_per_hundred,
             fill = continent)) +
  geom_col() +
  scale_x_reordered() +
  facet_wrap(~ continent, scales = "free") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none") +
  labs(x = "", y = "Vaccinations per 100")
```

Column {data-width=300}
-------------------------------------

### Top Vaccinated Countries

```{r top-countries-vax, out.height="100%"}
covid_dist %>%
  select(location, continent, total_vaccinations_per_hundred) %>%
  arrange(-total_vaccinations_per_hundred) %>%
  head(12) %>%
  kable() %>%
  kable_material(c("striped", "hover"))
```

Most Used Vaccine in the World {data-orientation=rows}
=====================================






```{r}
covid_clean_nona <- na.omit(covid_clean)
```

Inputs {.sidebar}
-------------------------------------






##### Most Used Vaccine

* The most used vaccine is AstraZeneca - AZD1222, with 86 countries using it.

##### Least Used Vaccine

* The least used vaccines are SRCVB - EpiVacCorona and Anhui ZL - Recombinant, each with only 1 country using it.

Row
--------------------------------------

### Most popular vaccine

```{r, echo = FALSE}
# count countries per brand, ignoring rows with no recorded brand,
# and keep the top brand name as a plain string for the value box
mostpopular <- covid_clean %>%
  filter(!is.na(vaccines_used)) %>%
  count(vaccines_used) %>%
  arrange(desc(n)) %>%
  slice_head(n = 1) %>%
  pull(vaccines_used)

valueBox(value = mostpopular, icon = "fa-syringe", caption = "Most used Vaccine", color = "lightgreen")
```

### Least popular vaccine

```{r, echo = FALSE}
leastpopular <- covid_clean %>%
  filter(!is.na(vaccines_used)) %>%
  count(vaccines_used) %>%
  arrange(n) %>%
  select(vaccines_used)

# the two least used brands are tied, so paste both names into one value box
leastpopular_both <- leastpopular %>%
  slice_head(n = 2) %>%
  pull(vaccines_used) %>%
  paste(collapse = " & ")

valueBox(value = leastpopular_both, caption = "Least used Vaccines", color = "darkseagreen")
```

Row
-------------------------------------

### What are the numbers like?

```{r, echo = FALSE}
# per-brand row count, used only to order the bars
world <- covid_clean_nona %>%
  group_by(vaccines_used) %>%
  mutate(count = n())

g1 <- ggplot(data = world, aes(x = reorder(vaccines_used, -count), fill = vaccines_used)) +
  geom_bar(stat = "count") +
  theme(axis.text.x = element_blank(),
        legend.title = element_text(size = 12)) +
  labs(x = "Vaccine Brands", y = "Number of Countries", fill = "Different Vaccine Brands")

ggplotly(g1)
```

Per Continent {data-orientation=rows}
=====================================





Inputs {.sidebar} -------------------------------------






##### North America

* AstraZeneca - AZD1222 is the most commonly used vaccine in North America, with a country count of 13.

##### South America

* South America had one of the most even distributions of brands.
* AstraZeneca - AZD1222 is the most used, with a country count of 8.

##### Africa

* Africa differs from the rest, with SII - Covishield being the most used, by 38 countries.
* AstraZeneca, the most used vaccine worldwide, was only the 5th most used in Africa.

##### Europe

* The top vaccines were AstraZeneca - AZD1222 and Pfizer BioNTech - Comirnaty, both with a country count of 37.
* Europe was the only continent to use SRCVB - EpiVacCorona.

##### Oceania

* Oceania has a very small data pool compared to the other continents.
* The most used vaccine is still AstraZeneca - AZD1222.

##### Asia

* Asia had the greatest variety of vaccines.
* The most used vaccine is Pfizer BioNTech - Comirnaty, at 24 countries.
* Asia is the only continent to use Anhui ZL - Recombinant.

Row
--------------------------------------

```{r, echo = FALSE}
North_America <- filter(covid_clean_nona, continent %in% "North America") %>%
  group_by(vaccines_used) %>%
  mutate(count = n())

NA_graph <- ggplot(data = North_America, aes(x = reorder(vaccines_used, -count), fill = vaccines_used)) +
  geom_bar(stat = "count") +
  theme(axis.text.x = element_blank(), legend.title = element_blank()) +
  labs(x = "Vaccine Brands", y = "Number of Countries", title = "Different Vaccine Brands in North America")

ggplotly(NA_graph)
```

```{r, echo = FALSE}
South_America <- filter(covid_clean_nona, continent %in% "South America") %>%
  group_by(vaccines_used) %>%
  mutate(count = n())

SA_graph <- ggplot(data = South_America, aes(x = reorder(vaccines_used, -count), fill = vaccines_used)) +
  geom_bar(stat = "count") +
  theme(axis.text.x = element_blank(), legend.title = element_blank()) +
  labs(x = "Vaccine Brands", y = "Number of Countries", title = "Different Vaccine Brands in South America")

ggplotly(SA_graph)
```

```{r, echo = FALSE}
Africa <- filter(covid_clean_nona, continent %in% "Africa") %>%
  group_by(vaccines_used) %>%
  mutate(count = n())

Africa_graph <- ggplot(data = Africa, aes(x = reorder(vaccines_used, -count), fill = vaccines_used)) +
  geom_bar(stat = "count") +
  theme(axis.text.x = element_blank(), legend.title = element_blank()) +
  labs(x = "Vaccine Brands", y = "Number of Countries", title = "Different Vaccine Brands in Africa")

ggplotly(Africa_graph)
```

Row
--------------------------------------

```{r, echo = FALSE}
Europe <- filter(covid_clean_nona, continent %in% "Europe") %>%
  group_by(vaccines_used) %>%
  mutate(count = n())

Europe_graph <- ggplot(data = Europe, aes(x = reorder(vaccines_used, -count), fill = vaccines_used)) +
  geom_bar(stat = "count") +
  theme(axis.text.x = element_blank(), legend.title = element_blank()) +
  labs(x = "Vaccine Brands", y = "Number of Countries", title = "Different Vaccine Brands in Europe")

ggplotly(Europe_graph)
```

```{r, echo = FALSE}
Oceania <- filter(covid_clean_nona, continent %in% "Oceania") %>%
  group_by(vaccines_used) %>%
  mutate(count = n())

Oceania_graph <- ggplot(data = Oceania, aes(x = reorder(vaccines_used, -count), fill = vaccines_used)) +
  geom_bar(stat = "count") +
  theme(axis.text.x = element_blank(), legend.title = element_blank()) +
  labs(x = "Vaccine Brands", y = "Number of Countries", title = "Different Vaccine Brands in Oceania")

ggplotly(Oceania_graph)
```

```{r, echo = FALSE}
Asia <- filter(covid_clean_nona, continent %in% "Asia") %>%
  group_by(vaccines_used) %>%
  mutate(count = n())

Asia_graph <- ggplot(data = Asia, aes(x = reorder(vaccines_used, -count), fill = vaccines_used)) +
  geom_bar(stat = "count") +
  theme(axis.text.x = element_blank(), legend.title = element_blank()) +
  labs(x = "Vaccine Brands", y = "Number of Countries", title = "Different Vaccine Brands in Asia")

ggplotly(Asia_graph)
```

Countries with the Most Number of Brands {data-orientation=rows}
=====================================






Inputs {.sidebar} -------------------------------------






##### Vaccine Variety

* The Philippines is the country with the largest number of different vaccines, at 9 different kinds.

Row
--------------------------------------

```{r, echo = FALSE}
# one row per country and brand, so counting rows per location gives brands per country
Most <- count(covid_clean_nona, location) %>%
  arrange(desc(n)) %>%
  rename("Number of vaccine brands" = n) %>%
  rename(Country = location)

datatable(Most, class = 'cell-border stripe')
```

GDP vs Vaccinations and Cases {data-icon="fa-globe"}
=============================





```{r Filtered-Data}
covid_clean2 <- covid_clean %>%
  distinct(location, .keep_all = TRUE) %>%
  arrange(total_vaccinations)

filtered_covid_clean <- covid_clean2 %>%
  select(location, continent, total_vaccinations_per_hundred,
         gdp_per_capita, total_cases_per_million) %>%
  arrange(-total_vaccinations_per_hundred)
```

Inputs {.sidebar}
-------------------------------------






VACC vs GDP

- Most low GDP per capita countries have zero vaccinations.
- Some low GDP per capita countries have low to high levels of vaccination.
    - The high levels of vaccination are due to donations.
- There is some positive relationship between gdp_per_capita and total_vaccinations_per_hundred.
- The relationship breaks down once GDP per capita exceeds 60000.
- Conclusion: gdp_per_capita is not a strong explanatory variable for total_vaccinations_per_hundred.

Cases vs GDP

- A linear relationship can be detected to a certain extent.
- Wealthier countries (based on gdp_per_capita) tend to have higher total_cases_per_million.
- Conclusion: the linear relationship is most visible for European countries compared to other countries.

Column
--------------------------------------

### VACC vs GDP

```{r VACC-vs-GDP}
VAC_vs_GDP <- ggplot(filtered_covid_clean,
                     aes(x = gdp_per_capita,
                         y = total_vaccinations_per_hundred,
                         text = location,
                         colour = continent)) +
  geom_point()

# pass the plot explicitly rather than relying on the last plot created
ggplotly(VAC_vs_GDP)
```

Column
--------------------------------------

### Cases vs GDP

```{r Cases-vs-GDP}
CASES_vs_GDP <- ggplot(filtered_covid_clean,
                       aes(x = gdp_per_capita,
                           y = total_cases_per_million,
                           text = location,
                           colour = continent)) +
  geom_point()

ggplotly(CASES_vs_GDP)
```

CPI score {data-orientation=rows}
=====================================





Inputs {.sidebar} -------------------------------------






* Countries with fewer than 40 vaccinations per 100 people tend to be perceived as highly corrupt, while the top three countries by vaccination level belong to the less corrupt cluster
* Most countries have fewer than 20 vaccinations per 100 people because the vaccines were developed only recently
* Numerous countries that reported a small number of cases are perceived as highly corrupt and have not started vaccinating yet

```{r echo=FALSE}
# one row per country (the brand split created several rows per country)
q2 <- covid_clean[!duplicated(covid_clean$iso_code), ]
```

Column {data-width=500}
-----------------------------------------------------------------------

### CPI vs Vaccination

```{r}
p1 <- plot_ly(q2,
              x = ~ `CPI_score_2020`,
              y = ~ `total_vaccinations_per_hundred`,
              color = ~ `continent`,
              name = ~ `location`,
              showlegend = FALSE,
              size = ~ `total_cases_per_million`) %>%
  layout(xaxis = list(title = "CPI score"),
         yaxis = list(title = "Vaccinations per 100"))
p1
```

### CPI vs Covid-19 test

```{r}
p2 <- plot_ly(q2,
              x = ~ `CPI_score_2020`,
              y = ~ `total_tests_per_thousand`,
              color = ~ `continent`,
              name = ~ `location`,
              showlegend = FALSE,
              size = ~ `total_cases_per_million`) %>%
  layout(xaxis = list(title = "CPI score"),
         yaxis = list(title = "Tests per 1000"))
p2
```

HDI {data-orientation=rows}
=====================================





Inputs {.sidebar} -------------------------------------






* The number of tests tends to be higher in more developed countries
* On the other hand, the number of COVID-19 tests may also be driven by the infection level

Column {data-width=500}
-----------------------------------------------------------------------

### HDI vs Vaccination

```{r}
p5 <- plot_ly(q2,
              x = ~ `human_development_index`,
              y = ~ `total_vaccinations_per_hundred`,
              color = ~ `continent`,
              name = ~ `location`,
              showlegend = FALSE,
              size = ~ `total_cases_per_million`) %>%
  layout(xaxis = list(title = "Human development index"),
         yaxis = list(title = "Vaccinations per 100"))
p5
```

### HDI vs Covid-19 test

```{r}
p6 <- plot_ly(q2,
              x = ~ `human_development_index`,
              y = ~ `total_tests_per_thousand`,
              color = ~ `continent`,
              name = ~ `location`,
              showlegend = FALSE,
              size = ~ `total_cases_per_million`) %>%
  layout(xaxis = list(title = "Human development index"),
         yaxis = list(title = "Tests per 1000"))
p6
```

CPI vs Vaccinations {data-orientation=rows}
=====================================





Inputs {.sidebar} -------------------------------------






* A simple linear regression of the vaccination level on the CPI score explains less than 20% of the variation in the vaccination level
* This indicates that there is room for improvement in the model

Row {data-height=400}
-------------------------------------

### Regression (R² = 17.7%)

```{r}
mod1 <- lm(total_vaccinations_per_hundred ~ CPI_score_2020, data = q2)
tidy(mod1)
```

### Residual visualisation

```{r}
mod_diagnostics <- augment(mod1)

var_scatter <- ggplot(q2, aes(x = CPI_score_2020, y = total_vaccinations_per_hundred)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE)

var_scatter_plotly <- var_scatter +
  # overlay fitted values
  geom_point(data = mod_diagnostics, aes(y = .fitted), color = "blue", alpha = 0.2) +
  # draw a line segment from the fitted value to observed value
  geom_segment(data = mod_diagnostics,
               aes(xend = CPI_score_2020, y = .fitted, yend = total_vaccinations_per_hundred),
               color = "blue", alpha = 0.2)

ggplotly(var_scatter_plotly)
```

Row {data-height=600}
-------------------------------------

### Residuals plots

```{r}
resid_panel(mod1, plots = "all")
```

CPI vs Tests {data-orientation=rows}
=====================================





Inputs {.sidebar} -------------------------------------






* A simple linear regression of the number of tests on the CPI score explains around 18% of the variation in the number of tests
* This indicates that there is room for improvement in the model

Row {data-height=400}
-------------------------------------

### Regression (R² = 18%)

```{r}
mod2 <- lm(total_tests_per_thousand ~ CPI_score_2020, data = q2)
tidy(mod2)
```

### Residual visualisation

```{r}
mod_diagnostics2 <- augment(mod2)

var_scatter2 <- ggplot(q2, aes(x = CPI_score_2020, y = total_tests_per_thousand)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE)

var_scatter2_pl <- var_scatter2 +
  # overlay fitted values
  geom_point(data = mod_diagnostics2, aes(y = .fitted), color = "blue", alpha = 0.2) +
  # draw a line segment from the fitted value to observed value
  geom_segment(data = mod_diagnostics2,
               aes(xend = CPI_score_2020, y = .fitted, yend = total_tests_per_thousand),
               color = "blue", alpha = 0.2)

ggplotly(var_scatter2_pl)
```

Row {data-height=600}
-------------------------------------

### Residual plots

```{r}
resid_panel(mod2, plots = "all")
```

Conclusions
=====================================





As a result of the analysis, we reached the following conclusions:

1. For the majority of countries, case rates do not match vaccination rates: a high case rate does not imply a high vaccination rate, and vice versa. The same holds, on average, at the continent level. The top five vaccinated countries per continent were also highlighted; of note are the very low vaccination rates in Africa (excluding Seychelles) and in Oceania. Israel, Seychelles, UAE, Chile and Bhutan occupy the top five spots worldwide in vaccination rates.

2. Exploring the vaccines used worldwide shows that AstraZeneca is the most widely used vaccine, and by that measure the most trusted: it is in common use across the world and within each continent. On the other hand, EpiVacCorona and Anhui ZL are each used only in their country of origin, as they are still relatively new compared to the other available vaccines, making them the least used vaccines worldwide.

3. Since there is little to no relationship between GDP per capita and the level of vaccination, GDP per capita is not a strong explanatory variable for the level of vaccination. In general, however, wealthier countries tend to have higher total COVID cases.

4. Countries with less than 40% of people vaccinated tend to be highly corrupt, while the top three countries by vaccination level belong to the less corrupt cluster. The number of tests is higher in less corrupt countries, while fewer than 20 people are being tested in highly corrupt countries. We fitted simple regressions of the vaccination level and of the number of tests on the corruption score. The corruption level explains around 20% of the variation in the vaccination level, and about the same for the number of tests.

References {data-orientation=rows}
=====================================





### Data

* [Corruption Index](https://www.transparency.org/en/cpi/2020/index/nzl#)
* [Covid Data](https://ourworldindata.org/coronavirus-data)
    + [Codebook](https://github.com/owid/covid-19-data/tree/master/public/data)
* [Vaccination Data](https://covid19.who.int/info/)

### Software

* [R Software](https://www.R-project.org/)

### Packages

* [broom](https://CRAN.R-project.org/package=broom)
* [DT](https://CRAN.R-project.org/package=DT)
* [flexdashboard](https://CRAN.R-project.org/package=flexdashboard)
* [ggmap](https://journal.r-project.org/archive/2013-1/kahle-wickham.pdf)
* [ggResidpanel](https://CRAN.R-project.org/package=ggResidpanel)
* [ggthemes](https://CRAN.R-project.org/package=ggthemes)
* [here](https://CRAN.R-project.org/package=here)
* [janitor](https://CRAN.R-project.org/package=janitor)
* [kableExtra](https://CRAN.R-project.org/package=kableExtra)
* [lubridate](https://www.jstatsoft.org/v40/i03/)
* [maps](https://CRAN.R-project.org/package=maps)
* [modelr](https://CRAN.R-project.org/package=modelr)
* [naniar](https://CRAN.R-project.org/package=naniar)
* [plotly](https://plotly-r.com)
* [tidytext](http://dx.doi.org/10.21105/joss.00037)
* [tidyverse](https://doi.org/10.21105/joss.01686)
* [viridis](https://sjmgarnier.github.io/viridis/)

### Misc

* [EpiVacCorona](https://www.precisionvaccinations.com/vaccines/epivaccorona-vaccine)
* [Recombinant](https://www.thehindu.com/news/international/china-approves-fourth-covid-19-vaccine-for-emergency-use/article34080651.ece)
* [Vaccine Donation](https://www.globalcitizen.org/en/content/covid-19-vaccine-donations-around-the-world/)